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46 Abstract 

47 

48 An attractive property of ensemble data assimilation methods is that they provide flow dependent 

49 background error covariance estimates which can be used to update fields of observed variables 

50 as well as fields of unobserved model variables. Two methods to estimate background error 

51 covariances are introduced which share the above property with ensemble data assimilation 

52 methods but do not involve the integration of multiple model trajectories. Instead, ah the 

53 necessary covariance information is obtained from a single model integration. The Space 

54 Adaptive Forecast error Estimation (SAFE) algorithm estimates error covariances from the 

55 spatial distribution of model variables within a single state vector. The Flow Adaptive error 

56 Statistics from a Time series (FAST) method constructs an ensemble sampled from a moving 

57 window along a model trajectory. 

58 

59 SAFE and FAST are applied to the assimilation of Argo temperature profiles into version 4.1 of 

60 the Modular Ocean Model (MOM4.1) coupled to the GEOS-5 atmospheric model and to the 

61 CICE sea ice model. The results are validated against unassimilated Argo salinity data. They 

62 show that SAFE and FAST are competitive with the ensemble optimal interpolation (EnOI) used 

63 by the Global Modeling and Assimilation Office (GMAO) to produce its ocean analysis. 

64 Because of their reduced cost, SAFE and FAST hold promise for high-resolution data 

65 assimilation applications. 

66 
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69 ensemble optimal interpolation 
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71 

72 1 . Introduction 

73 Following a seminal paper by Evensen (1994) introducing the ensemble Kalman filter (EnKF), 

74 ensemble data assimilation (EDA) methods have gained wide acceptance and usage in the 

75 geophysical sciences. While EDA methods differ in terms of the approach used to update or 

76 resample the ensemble of model states, they all require an ad hoc number of concurrent model 

77 integrations to estimate the distribution of background errors. This approach is essentially an 

78 0(w) procedure, where n is the size of the model state vector. In contrast, the original Kalman 

79 (1960) filter algorithm propagates its background error covariance estimates by means of matrix 

80 multiplications of O (n) . Hence, EDA methods are comparably economical from a numerical 

8 1 standpoint. Yet, their cost is significantly higher than that of conventional methods that do not 

82 involve ensemble model integrations. Thus, implementations of EDA methods must 

83 compromise between ensemble size and model resolution. 

84 

85 Because the analysis and error estimates depend on the state of each ensemble member, EDA 

86 methods are flow-adaptive. They also provide estimates of the cross-field covariance between 

87 observed and unobserved model fields that can be used to update unobserved system variables. 

88 For example, ocean sub-surface fields can be updated even if only surface observations are 

89 available. 

90 

91 The purpose of this paper is to introduce two data assimilation algorithms that share the 

92 abovementioned properties of EDA methods but, unlike EDA methods, rely on only a single 

93 model trajectory to estimate the necessary error-covariance information. As such, these methods 

94 obviate the requirement to compromise between ensemble size and model resolution. The Space 

95 Adaptive Forecast-error Estimation (SAFE) algorithm estimates error covariances from the 

96 spatial distribution of model variables in the neighborhood of every model grid cell in a single 

97 background state. Rather, the Flow Adaptive error Statistics from a Time series (FAST) 

98 algorithm estimates covariances from the recent distribution of high-pass filtered lagged 

99 instances of the model state vector sampled along the same trajectory. Because they do not 

100 require multiple integrations of the numerical model, SAFE and FAST are considerably less 

101 resource hungry than typical EDA methods and thus hold promise for high-resolution data 

102 assimilation applications. 

103 

104 The underlying assumption on which SAFE and FAST are based is that errors in the forecasts 

105 used in assimilation are primarily phase errors in space and/or time. For the ocean, this 

106 assumption makes sense as the dominant source of error can be related to errors in surface 

107 forcing, i.e., the timing, intensity, or location of particular atmospheric synoptic events. Thus, 

108 the forecast (or background) errors can be related to the timing or intensity in the propagation or 

109 advection of oceanic anomalies. 

110 

111 The algorithms are outlined in Section 2 and compared to conventional assimilation techniques 

112 in Section 3 where they are applied to the assimilation of Argo temperature (T) profiles into the 

113 OGCM component of the NASA Global Modeling and Assimilation Office (GMAO) Goddard 

114 Earth Observing System (GEOS). Unassimilated Argo salinity (S) observations are used to 

115 validate the assimilation. Conclusions follow in Section 4. 

116 
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2. Assimilation Algorithms 
2.1 Preamble 

Most sequential data assimilation algorithms are inspired by or derived from the Kalman filter 
(Kalman 1960) and involve the following steps, 

x{ = M(x k _ v f k _ { ), (la) 
y k = H k(x k ) + £ k , E(s k s T k ) = R k , (lb) 
K k =P k f H T k [H k P k f H T k+ R k \\ (lc) 

K = x{ + K [y k - H k (x { )], (Id) 

where the subscript k refers to the Ath of a sequence of assimilations, x 1 and x a denote the model 
forecast and analyzed states, M is the model operator, and f k -i represents the forcing between 
times t k -i and t k . The observations, y k , assimilated at time t k are related to the true system state, x, 
at time t k by equation (lb) where H k is the observation operator, E denotes the expectation 
operator and s k , with covariance matrix R k , is the observation error vector. The Kalman gain 
matrix, K k , dictates how the observations and model forecast are weighted in the analysis 
computation (equation Id). It depends on H k , R k and the background error covariance matrix, 

P k =E((x < -x{)(x' -x{) T ). (2) 

Of course, since x f is unknown, P k cannot be computed directly from equation (2) and must be 
estimated, either explicitly or implicitly, by some other means. In fact, the procedure used to 
estimate P k can be used to classify data assimilation methods. 

In most EDA methods, P k is estimated from the statistical distribution of an ensemble of model 
forecasts, 

x( k =M(x. k _ l ,f k _ l ), i = l,...,n, (3) 

started from an ensemble of n analyzed model states at the previous analysis time, t k -i. 
Following Houtekamer and Mitchell (2001), many EDA systems filter spurious long-range 
covariances resulting from finite ensemble sizes by (dropping the k subscript) decomposing P as 

P = P e *C, (4) 

where P e represents the background covariances estimated from the ensemble of model states, C 
is a compactly supported correlation matrix and • denotes the Schur (i.e., element by element) 
product of two matrices. 

In a class of methods known alternatively as ensemble optimal interpolation (EnOI: e.g., 
Borovikov et al. 2005; Oke et al. 2005, 2010; Wan et al. 2010; Vernieres et al. 2012) or 
asymptotic ensemble filters, the time dependency is neglected and P is estimated from the 
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157 statistics of one or more model run histories or from combinations of model histories. In many 

158 cases, EnOI methods are competitive with the flow-dependent EDA methods because they make 

159 up for the perfonnance degradation due to neglecting the forecast-error evolution by estimating 

160 error statistics from a much larger ensemble. 

161 

162 Optimal interpolation (01: Eliassen 1954) refers to an older class of data assimilation methods in 

163 which background error covariances are modeled with Gaussian functions or other analytically 

164 or empirically derived functions. Cross-field covariances are generally neglected in these 

165 methods and only the model field corresponding to the observed variable is updated. 

166 

167 2.2 Space Adaptive Forecast error Estimation (SAFE) 

168 The SAFE algorithm attempts to combine the simplicity and cost effectiveness of 01 with the 

169 large sample size of EnOI and the flow dependency of the EnKF. It estimates background error 

170 covariances by treating the state variables in neighboring grid cells surrounding every model grid 

171 point as if they were the state variables of other ensemble members at the same grid point. 

172 Because the size of the neighborhood determines the covariance amplitudes, rescaling is 

173 necessary. Note however that an error-covariance rescaling step is also implicitly present in 

174 many EDA methods where the background error covariance amplitude is detennined by 

175 parameters of a covariance inflation procedure. 

176 

177 To facilitate the procedure in geophysical fluid models with complicated boundaries, the 

178 following algorithm is used. For simplicity of notation, we assume that the model state can be 

179 split according to 

180 


181 


x = [v,w]. 




( 5 ) 


182 

183 where v is an observed model field and w is unobserved. The generalization to more than two 

184 model fields is obvious. We also assume that all the data assimilated correspond to the same 

185 model quantity although the generalization to different observation types is also straightforward. 

186 In view of the above, the model update is split according to 

187 



v a = v f + P w U T [ffP w ff T + r] ’ [y- //(v 7 )]. 

(6 a) 

188 

Av 

W a =w f + P m Il r [nP w Il r + r] '[.y- //(»/)], 

(6b) 


= m/+P wv (P vv )“ 1 Av. 

(6c) 


189 

190 The application of equation (6c) is further simplified by assuming that the w background error in 

191 grid cell (i, j, k ) is predominantly related to the v error in grid cell (i, j, k ) and negligibly related 

192 to the v errors in other grid cells, thus neglecting the off diagonal elements of P vv in (6c). Instead 

193 the unobserved model field is updated according to 

194 
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195 


pwv 

r gk 


(6 d) 


"V = v V + 


Av ijk> j = h—,J, k = \—,K, 


pk 


196 


197 where /, J and /f denote the number of grid cells along the x, y, and z space dimensions, 

198 respectively. Heuristically, these simplifications are related to the assumption that if a and b are 

199 correlated and b and c are correlated, then a and c are correlated. 

200 
201 

202 The first step is to estimate the background error variance of the observed field (the procedure is 

203 the same regardless whether this field is prognostic or diagnostic) with 

204 

205 ff;=rfiag(P w ) = 0([v-0(v)] 2 ), (7) 

206 


207 where 0 is a local 3D averaging operator. For our implementation, repetitive application of a 

208 gridpoint (spatial) Laplacian smoother was found to be effective. The results of Section 3 

209 (Figure 1) indicate that the size of the regions over which the averaging is applied is of little 

210 consequence. 

211 

212 The variance estimate is rescaled such that 

213 

214 \\diag(HP vv H')\\ = y 2 \\diag(R)\\, (8) 

215 


216 where the double vertical bar stands for an L2 vector norm. The parameter y is prescribed. It is 

217 a scalar representing the global mean (asymptotic) target ratio of background error variances to 

218 data error variances and its role is similar to that of multiplicative covariance inflation 

219 parameters used in many EDA applications. Note that this fonnulation assumes a steady state 

220 regime where the average global mean error variance increase between successive assimilations 

221 equals the mean error variance decrease resulting from each assimilation step. 

After estimating the background error variances, the update of equation (6a) is applied. This step 
corresponds to an OI analysis with the model background error variances calculated with 
equation (7). Let 7112 represent the covariance of the v background errors at locations 1 and 2. It 
is estimated with 


224 

225 

226 
227 


228 

229 

230 

231 

232 

233 

234 

235 


7li2 — < ^ v i < ^ v 2pl2 > (9fi() 

p 12 = c 0 (max(j- \v l -v 2 \,j r \x l -x 2 \ + Y-\y l -y 2 \ + j r \z l -z 2 1)), (9b) 

v 1 L x 1 y z 

where c v i and g V 2 are estimated with equation (7), the Ls are length scales in units of the 
variable v and in the three space dimensions and Co is the popular function given by equation 
(4.10) of Gaspari and Cohn (1999), or any other compactly supported correlation function. 
Alternatively, Euclidian distance can be used in the right hand side of equation (9b) at the 
expense of a slightly higher operation count. The max function selects the largest of its 
arguments. 
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236 

237 

238 

239 


Equation (9b) ensures that %\2 is 0 if either v; differs significantly from V 2 or if locations 1 and 2 
are very distant from each other. The intent is that in the majority of cases, 


240 

241 

242 

243 

244 

245 

246 


^12 — <7 vl <7 v2 C o((| V l V 2 |) ^ K )’ 

and the modulation of the background error covariances with the Co function enforces error 
covariance localization in a state-dependent manner. The formulation with the max function is 
pertinent to the ocean where strong gradients often coincide with zero correlation surfaces. In 
other applications, one can replace equation (9b) with 


247 

248 

249 

250 


n U = ° V F V 2 C o( i[ V r V 2) /L ) +((-*) --*2) 7 Z x) +((y i ~^) /Z y) +((*, ~ * 2 ) / Z *) 


The local cross-field covariances of the v and w errors in every grid cell are estimated with 


25 1 o 2 vw = 0([v - 0(v)][w - 0 (h>)]). (10) 

252 

253 They are used to update the fields of unobserved variables according to equation (6b-d). 

254 

255 2.3 Flow Adaptive error Statistics from a Time series (FAST) 

256 Unlike SAFE which uses the spatial distribution of model variables to estimate error covariances, 

257 FAST computes the analysis increment at time tk from n previous instances of the model state 

258 vector sampled from the recent history of the current model run, 


259 


X k={ x k-j- x (k)’ 7 = 0,..., «-l}, (11a) 


260 


*(*) - 


Z n— 

M x *-p 


(11 b) 


261 

262 

263 where x k = x(t k ), X kA = x(t k ] =t k ~z), etc., for a given time lag z. Arguably, r should be such 

264 that Xk -1 differs significantly from Xk while it still contains information that is useful at tu- For 

265 simplicity, z is set to the assimilation interval in this study. 

266 

267 While one could attempt to compute the analysis from Xk without further preprocessing as 

268 though it were made of the current state of each member of an ensemble of model trajectories, 

269 the resulting error covariance estimates would be dominated by the instances furthest away from 

270 the center of the time window since x (kj is the simple moving average of length n estimated at 

271 time t k _ nl2 . To prevent this from occurring and improve the assimilation performance, the lagged 

272 state instances are first high-pass filtered and then resampled to remove the remaining sequential 

273 ordering information. 

274 

275 The high-pass filtering takes the form 


7 


j = 0,...,n-\\ (12) 


276 

277 

278 


309 

310 

311 

312 

313 

314 




'^k-j % k~j , 


279 where the sequence of XL is an exponential moving average (EMA) of the model state history, 

280 

281 x° k =ax k +(l-a)x“_ 1? (13) 

282 

283 where 0<a<l. A good choice to filter out time scales longer than half the sampling time 

284 window is <x = 4/(n + 2). The case with a = 0.5 is essentially equivalent to forming the ensemble 

285 of first order differences over the time window. 

286 

287 The resampling, 

288 


289 


290 




X,, : = 


*r= k 


x". 


7 = 0, 
7=0, 


4 


(14a) 

(146) 


29 1 uses weights, fiy, drawn from a uniform random distribution. 

292 

293 FAST makes the same calculations to estimate background error covariances and compute 

294 assimilation increments with the ensemble of deviations from equation (14b) as the EnKF makes 

295 with its ensemble of model states at time tk (eg-, equation 2b-f of Keppenne et al. 2008). One 

296 notable difference is that FAST calculates only one increment. Because a single model 

297 integration is involved, the ensemble size ( n ) can be increased at a very minimal cost 

298 

299 2.4 GEOS-5 Modeling and Ocean Data Assimilation System 

300 2.4.1 GEOS-5 atmosphere-ocean general circulation model 

301 The SAFE and FAST algorithms are tested in Section 3 in the context of assimilating Argo 

302 temperature data into the GFDF MOM4.1 ocean model coupled to the NASA GEOS-5 AGCM 

303 and to the Eos Alamos CICE ice model (all of which comprise the GEOS-5 AOGCM). The 

304 model configuration is the same as that used for the GMAO ocean analysis (Vemieres et al. 

305 2012). In summary, the OGCM is run with a geopotential vertical coordinate on a V 2 0 grid with a 

306 gradual meridional refinement to %° at the Equator and with 40 vertical levels. The grid is 

307 Cartesian south of 60°N and tripolar northward thereof. The AGCM grid is 1° x 1.25° with 72 

308 levels. The CICE model is run on the same horizontal grid as the OGCM. The AGCM is 
constrained by replaying the Modern-Era Retrospective analysis for Research and Applications 
(MERRA: Rienecker et al. 2011) while the ocean observations are assimilated. The replay 
procedure replaces the AGCM state with the state of the analysis every six hours. 


2.4.2 GEOS integrated ocean data assimilation system (iODAS) 

The components of the GEOS-5 AOGCM are connected to each other and to the GEOS 

315 integrated ocean data assimilation system (iODAS) with the Earth System Modeling Framework 

316 (ESMF). Besides SAFE and FAST, an EnOI utilizing a steady state ensemble of forecast-error 

317 estimates is used in Section 3 as a comparison benchmark. The parallel implementation of 

318 iODAS follows Keppenne and Rienecker (2003). 


319 

320 

321 

322 

323 

324 


SAFE. FAST and EnOI background error covariances are localized according to equation (4) 
where the element of C corresponding to the zth and /th model state variables at space-time 
locations (x h y„ z„ t,) and (xj, yj, zj, tj), is given by 


=c 0 (max(f \r t -r, \Mx i - x,| + j-\y, ~y\ + ^-\z i -z ; .|))c 0 (f \t t -t . J), (15) 


325 

326 where r, and rj are the adaptive localization variable at locations i and j and the r field is the 

327 observed variable. Note the similarity with equation (9b), except for the appearance of the 

328 temporal term, cA-^-n. -t- ). The latter results from differences between the measurement times 

329 and the analysis time. The application of equation (15) to modulate the background error 

330 covariances enforces a state-dependent error-covariance localization, even when the raw 

33 1 covariances are time-independent, as is the case with EnOI. 

332 

333 

334 3. Application 

335 To validate SAFE and FAST, we ran four AOGCM experiments assimilating T profiles from the 

336 broad-scale global array of temperature/salinity profiling floats (Argo: Gould et al. 2004). In the 

337 SAFE, FAST and EnOI runs, both T and S ocean model fields are updated. As in the GMAO 

338 production ocean analysis (Vernieres et al. 2012), the EnOI background error covariances are 

339 computed from the leading 20 EOFs (with the corresponding ensemble mean removed) of an 

340 ensemble of 186 short-term forecast-error estimates from coupled GEOS-5 forecasts. TOI is an 

341 univariate OI run in which the background error variance corresponds to the cumulative variance 

342 of the EOFs used in the EnOI run. Note tha the TOI run is included for completeness, even 

343 though it is known that assimilation that does not update salinity carefully can give a poor 

344 analysis (e.g., Sun et al. 2007). Besides the assimilated Argo T data, unassimilated Argo S data 

345 are used for validation. A control run without data assimilation was also run. 

346 

347 The runs cover a two-year period starting January 1, 2010. The ocean initial conditions are the 

348 same for all runs and come from the GMAO ocean analysis (Vernieres et al. 2012). The GEOS-5 

349 replay procedure constrains the atmosphere to MERRA over the period of the runs. Every five 

350 days, data from a 5-day time window centered about the analysis time are processed. The 

35 1 operator H is a 4-dimensional interpolation operator to the time and location of the observations. 

352 The observational error model is vertically Gaussian to reflect correlated errors in each ARGO 

353 profile and the absence of error correlations between distinct profiles. The observational error 

354 variance varies as a function of depth according to the magnitude of the vertical gradient. 

355 Details are provided in Vernieres et al. (2012). The assimilation increments are applied 

356 incrementally over a five-day period, as in the incremental analysis update procedure of Bloom 

357 et al. (1996), but without rewinding the model clock (Keppenne et al. 2008). 

358 

359 The SAFE run estimates its background error covariances from equations (7) and (10) where the 

360 0 operator consists of 10 diffusion steps. To improve the perfonnance in the low latitudes, 

361 SAFE error covariances are explicitly disabled when they involve a grid cell within the 10°N- 

362 10°S latitude band and another grid cell outside of it. This step is exclusively applied in the 

363 SAFE run to prevent the state variables at grid cells outside the waveguide from participating in 
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364 the estimation of the background error variance (0 operator) at grid cells inside the waveguide. 

365 The FAST run applies equations (11-14) with a five-day lag, n = 20 and a=0.18. Only 20 lags are 

366 used to facilitate comparison with EnOI, since the latter uses a static ensemble of 20 leading 

367 EOFs. The error-covariance localization scales (Zs in equation 15) are the same in all runs and 

368 are identical to those used in Vemieres et al. (2012). 

369 

370 Figure 1 shows that varying the size of the neighborhood used in the SAFE background error 

371 estimation (number of 0 smoothing iterations in equation 7) has little effect on the performance 

372 of the SAFE algorithm. It shows the evolution of global RMS OMF reduction from the 

373 corresponding RMS OMF from the control run without data assimilation, such that negative 

374 numbers indicate that the analysis is closer to the data than the control. Fig. la corresponds to 

375 the assimilated T data and Fig. lb to the unassimilated S data. The three cases shown correspond 

376 to 5 (red), 10 (blue) and 20 (green) iterations of a Laplacian filter. While the case with 20 

377 iterations produces a somewhat larger RMS S OMF reduction, the differences from the other two 

378 cases are small. 

379 

380 Figure 2 illustrates how high-pass filtering and resampling the sequence of background states 

381 from which FAST estimates error covariances affects the assimilation performance. It shows the 

382 global RMS OMF reduction from the control for both T and S in five cases. The green lines 

383 correspond to the full FAST methodology (equations 11-14) with n= 20 and a=0.18 (period 10 

384 EMA). The four other cases shown correspond to (1) the deviations from their ensemble mean 

385 of the most recent 20 unfiltered background states sampled every five days (magenta), (2 and 3) 

386 the deviations from their ensemble mean of the most recent 20 first order time differences (cyan) 

387 and second-order time differences (blue) of background states sampled every five days and (4) 

388 the EnOI run (red). Clearly, computing covariances from unfiltered background states, a 

389 procedure which corresponds to using signal covariances, results in the poorest performance for 

390 S data even though it draws the model state closest to the T data. The performance obtained with 

391 the dynamic ensembles of most recent first and second order time differences is close to that 

392 obtained with the static ensemble of leading EOFs. FAST with 50-day high-pass filtering 

393 (period- 10 EMA removal from a time series with L = 5 days) performs best for S and achieves a 

394 good compromise for T. Presumably, the 50-day filtering retains pertinent information and 

395 avoids aliasing to the lower frequencies but it is possible that better results could be obtained 

396 with a different high-pass period. 

397 

398 To illustrate the SAFE and FAST error covariance models, Figure 3 shows time sequences of 

399 zonal vertical cross sections at the Equator through the SAFE (Fig. 3 a-d) and FAST (Fig. 3 e-h) 

400 background error standard deviation estimates for the model’s ocean temperature. The succession 

401 is shown with a 3-month interval. The FAST and SAFE sections are qualitatively similar. Yet, 

402 the SAFE estimates are noticeably smoother because the number of grid cells participating in the 

403 SAFE spatial averaging is larger than the number of lagged state instances used in the FAST 

404 computations. Also note the general resemblance to the corresponding section through the time- 

405 independent background error standard deviation field used by EnOI and TOI (Fig. 3i). The 

406 differences between the equatorial sections are largest in the Indian and Atlantic Ocean. 

407 

408 The processing time of each run with data assimilation expressed in terms of the time taken by 

409 the control run on 30 Intel Altix Sandy Bridge nodes (360 2.8 GHz cores) is shown in Figure 4. 
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410 TOI takes 70% longer than the control run while FAST and EnOI both take about twice as long 

411 as TOI and SAFE takes nearly 50% longer than TOI. For comparison, the best case scenario for 

412 a 20-member ensemble run in which ensemble members are run sequentially is also shown. 

413 Running ensemble members in parallel, while possible with the GEOS iODAS would require 

414 many more compute nodes. 

415 

416 Figure 5 illustrates the background error covariance models used in each run by showing 

417 marginal T and S assimilation increments corresponding to the impact of a unit T innovation at 

418 (0°N, 140°W, 180m) at the end of the runs (January 1, 2012). The top row of panels (a), (e) and 

419 (i) shows zonal sections through the corresponding marginal T increments in the SAFE (left), 

420 FAST (middle) and EnOI (right) runs. The 2 nd row of panels (b), (f) and (j) shows corresponding 

421 meridional T sections. Panels (c), (g) and (k) (3 ld row) and the bottom row of panels (d), (h) and 

422 (1) show zonal and meridional sections through the corresponding marginal S increments. 

423 

424 The differences apparent in Figure 5 result primarily from differences in covariance modeling 

425 approach (static ensemble in EnOI, time-lagged ensemble in FAST, spatial covariance in SAFE). 

426 However, differences also arise from differences in the state adaptive error-localization of 

427 equation (15) since the differences between the respective background states have increased over 

428 time (particularly evident in Figure 6). The amplitude differences between the SAFE, FAST and 

429 EnOI marginal gains reflect differences in the background error estimates at the observation 

430 location. In this example, there is more correspondence between the shapes of the marginal T 

43 1 and S increments from the EnOI (panels (i) and (k) and panels (j) and (1)) than those from SAFE 

432 or FAST. The amplitude of the T marginal increment is also largest in the EnOI run. Yet, the 

433 amplitude of the S marginal increment is relatively small in the EnOI run, reflecting lower 

434 covariance between the T and S error estimates at this particular observation location. 

435 

436 To further illustrate how the SAFE, FAST and EnOI error-covariance models differ, Figure 6 

437 shows the time evolution (sampled every three months) of zonal sections through the marginal S 

438 increment corresponding to a unit T innovation at the same Equatorial location considered in 

439 Figure 5. Not surprisingly since the EnOI estimates background covariances from a static 

440 ensemble, its marginal S gain at this location displays the least temporal variation. The latter 

441 result from how the background T field ( r in equation (15)) changes with time. Conversely, the 

442 FAST marginal S gain varies the most with time as one could have expected because the 

443 corresponding background error covariances are high pass filtered by design and represent 

444 errors/uncertainties at periods shorter than 50 days in this case. Clearly, the FAST covariances 

445 are influenced by tropical instability waves which mostly occur between July and November and 

446 have wavelengths of 1000-2000 km and periods of 20-40 days (e.g., Willett et al., 2006). While 

447 the SAFE background error covariance calculations also depend on the background fields, the 

448 resulting covariances only capture variability in space, not in time. 

449 

450 Figure 7 quantifies the improvement (negative values) or worsening (positive values) over the 

451 control by showing to what extent the RMS OMF statistics differ from the corresponding 

452 statistics from the control run. RMS OMF differences are shown in each panel for the SAFE 

453 (blue), FAST (red), EnOI (green) and TOI (magenta) runs. Figure 7a corresponds to the 

454 assimilated Argo T data, while Figures 7b and 7c correspond to the unassimilated Argo S data 

455 above and below 300 meters. While the four data assimilation methods perform similarly for T, 
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456 FAST stands out for its better performance in terms of S, especially in the upper ocean (Fig. 7b). 

457 On the other hand, the underperformance of TOI, which degrades the model salt field compared 

458 to the control run, is especially striking in the thermocline (Fig. 7c). 

459 

460 The global RMS observation minus forecast (OMF) differences corresponding to the T data are 

461 comparable in the four runs with T data assimilation (SAFE: 0.76 °C, FAST: 0.88 °C, EnOI: 0.76 

462 °C, TOI: 0.87 °C), as they each explain approximately the same fraction of the T innovation 

463 variance of the control run (1.27 °C“). This result is as expected given that each run sets y=l in 

464 equation (8) to facilitate the comparison. Figure 8 further illustrates the respective performance 

465 of each run with T assimilation. The difference of the RMS OMF (horizontally and over time) in 

466 the data assimilation runs from that in the control is shown as a function of depth for 2011 (blue: 

467 SAFE, red: FAST, green: EnOI, magenta: TOI). Negative numbers mean that the data 

468 assimilation brings the (5-day lead) forecast state closer to the data than the control and should 

469 be the norm if the data are unbiased. Figure 8a corresponds to the assimilated T data and Figure 

470 8b to the unassimilated S data. For T, the level of improvement over the control is similar for all 

471 runs and is largest near a depth of 100 meters. For S, the results are markedly different. TOI is 

472 worse than the control over the entire water column and while SAFE, FAST and EnOI all 

473 improve upon the control over the entire column, FAST produces the largest improvement over 

474 the entire depth range. 

475 

476 The horizontal distributions of the differences in RMS S OMF from those of the control during 

477 2011 for each of the SAFE, FAST, EnOI and TOI runs are shown in Figure 9 for the upper 300 

478 meters and in Figure 10 for depths greater than 300 meters. In the upper ocean, SAFE, EnOI and 

479 TOI all show significant degradations from the control in the Western Equatorial South Pacific 

480 (red areas in Figs. 9a, 9c, and 9d). FAST does better in the same area and performs best overall 

481 (Fig. 7b). Since the upper ocean salt content is heavily influenced by precipitation and 

482 evaporation and the corresponding fluxes are constrained to the MERRA forcing in all runs, 

483 including the control, it is not surprising that the analyses (which all assimilate T only) do not 

484 outperform the control at the surface and in the mixed layer. Positive impacts on the model 

485 salinity from the T data assimilation are most likely to manifest themselves further away from 

486 the surface. Accordingly, the positive impact of the S field correction in the SAFE, FAST and 

487 EnOI runs is more apparent below 300 meters, especially in the Northern Atlantic, Guff Stream 

488 and Kuroshio areas and in the area of the West Australian and Leeuwin currents in the Southeast 

489 Indian Ocean. While FAST performs best overall, it under-performs the control in the Indian 

490 sector of the Southern Ocean. Since the comparison is restricted to 2011, these regional 

491 comments are not definitive. 

492 

493 

494 4. Outlook 

495 When EDA schemes are applied to complex numerical models, the ensemble size is always a 

496 limiting factor or the object of compromise. The methodologies introduced here are designed to 

497 possess the main advantages of EDA methods, namely the ability to update state variables even if 

498 unobserved (or not directly assimilated) and to adaptively estimate the spatial distribution of 

499 background errors, without incurring the cost of ensemble integrations. 

500 

501 While SAFE is nearly as economical as conventional OI, our results hint that it is somewhat less 
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502 effective as FAST or EnOI in updating fields of unobserved variables. The better performance of 

503 FAST in this respect may stem in part from its error covariance model ability to capture sub- 

504 seasonal variability and in part from the fact that it does not rely on the type of heuristic 

505 assumption made with SAFE between equations (6c) and (6d). 

506 

507 Of course, nothing precludes one from using FAST or SAFE to boost the ensemble size of an 

508 EDA scheme. SAFE background error estimates can be combined with those obtained with a 

509 dynamical ensemble as is usually done with OI covariances in hybrid EDA schemes. Several 

510 FAST trajectories can be run concurrently and the resulting time lagged ensembles combined 

511 into a single ensemble. Another area where SAFE and FAST seem to hold promise is in complex 

512 production systems where running an EDA scheme would require that the ensemble size or 

513 model resolution be severely limited, and in high-resolution data assimilation applications where 

514 numerical cost is critical. To illustrate this, we increased the MOM and CICE horizontal 

515 resolution to a 0.1° global tripolar grid with gradual meridional refinement to 0.05° and the 

516 GEOS-5 AGCM resolution to 0.25° x 0.3125°, while keeping the number of verticals levels 

517 unchanged (MOM/CICE: 40, AGCM: 72). We then started running the high resolution CGCM 

518 on 960 2.8 GHz Altix Sandy Bridge cores with a 5-minute time step replaying the MERRA 

519 reanalysis in its AGCM component and initializing its OGCM component with a horizontally 

520 constant hydrostatic equilibrium condition. Each day, a multi-scale (bi-scale) ocean analysis 

521 took place. First, T, S and current fields from the 0.5° GMAO ocean analysis (Vernieres et al. 

522 2012) were assimilated into the 0.1° global OGCM using SAFE and updating only the fields of 

523 observed variables. The covariance localization scales were the same as those used to produce 

524 the ocean analysis in this step. Following the assimilation of the 0.5° production analysis, the 

525 0.1° temperature analysis was refined by using SAFE to assimilate daily 0.25° Reynolds (2007) 

526 SSTs, shortening the horizontal localization scales to one fifth of the production analysis values. 

527 SAFE was used because FAST would have required the availability of past background states. 

528 One could choose to continue the analysis with FAST after the initial spin up. 

529 

530 Figures 11 and 12 illustrate the rapid convergence of the ocean surface conditions from the 

531 multi-scale ocean analysis to the Reynolds data. They show details of the SST field on August 

532 27, 2007, 27 days into the run. In each of Figures 11 and 12, panel (a) correspond to the 0.1° 

533 analysis, panel (b) shows the 0.25° Reynolds SST data and panel (c) shows the corresponding 

534 detail from the 0.5° production analysis. Had one wanted to produce such a fine analysis with 

535 EDA, the computational resource requirement would have been overwhelming (about 1 hour of 

536 wall clock time per simulation day per ensemble member on 960 cores). 

537 

538 
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608 

609 Figure captions 

610 Figure 1. Reduction of SAFE RMS OMF over the corresponding RMS OMF from the control 

611 run without data assimilation for (a) assimilated Argo T and (b) unassimilated Argo S data. The 

612 three cases shown correspond to SAFE runs in which the background error covariance estimation 

613 involves 5 (red), 10 (blue) and 20 (green) steps of a diffusive (Laplacian) fdter. Negative (vs. 

614 positive) values correspond to improvements (vs. worsening) over the control. 

615 

616 Figure 2. Reduction of RMS OMF over the corresponding RMS OMF from the control run 

617 without data assimilation for (a) assimilated Argo T and (b) unassimilated Argo S data in runs 

618 assimilating the Argo T data every five days and in which the background error covariances are 

619 estimated with either EnOI using a static ensemble of 20 leading error EOFs (EnOI: red), a 

620 lagged ensemble of the 20 most recent unfiltered background states (0 order: magenta), an 

621 ensemble of the 20 most recent first-order time differences (1 st order: cyan), an ensemble of the 

622 20 most recent second-order time differences (2 nd order: blue), or FAST with 20 lags and 50-day 

623 high pass filtering (FAST: green). Negative (vs. positive) values correspond to improvements 

624 (vs. worsening) over the control. 


625 Figure 3. Temperature background error standard deviation estimates along the Equator in the 

626 SAFE, FAST and EnOI runs of Section 3 and corresponding from top to bottom to March 31, 

627 2011 (a: SAFE, e: FAST), June 30, 2011 (b: SAFE, f: FAST), September 30, 2011 (c: SAFE, g: 

628 FAST) and December 31, 2011 (d: SAFE, h: FAST) Panel (i) shows the time independent 

629 background error standard deviation estimate used by both the EnOI and TOI runs. The color 

630 scale shown to the right of panel (i) is applicable for all panels. 

631 

632 Figure 4. Processing time per month of model simulation expressed in units of the 

633 corresponding processing time from the control run. Note the logarithmic scale. The EnKF case 

634 corresponds to a best case scenario for a 20-member EnKF run in which ensemble members are 

635 run sequentially. 

636 

637 Figure 5. Zonal and meridional sections through the marginal contribution to the T and S 

638 assimilation increments in PSU corresponding to a unit T innovation at (0°N, 140°W, 180m) in 

639 the SAFE (a-d), FAST (e-h) and EnOI (i-1) runs on January 1, 2012. Zonal (meridional) sections 

640 are labeled W-E (S-N). (a), (e), (i) correspond to T zonal sections, (b), (f), (j) to T meridional 

641 sections, (c), (g), (k) to S zonal sections and (d), (h), (1) to S meridional sections. The top color 

642 bar applies to all the panels in the top two rows. The bottom color bar applies to the bottom two 

643 rows. 

644 

645 Figure 6. Zonal sections through the marginal contribution to the S assimilation increment in 

646 PSU corresponding to a unit T innovation at (0°N, 140°W, 180m) in the SAFE (a-e), FAST (f-j) 

647 and EnOI (k-o) runs on (from top to bottom) January 1, 2010, April 1, 2010, July 1, 2010, 

648 October 1 , 20 10 and January 1 , 20 1 1 . The color bar to the right applies to all the panels. 

649 

650 Figure 7. (a) RMS OMF difference with RMS OMF from the control run without data 

651 assimilation for (a) assimilated Argo T data, (b) unassimilated Argo S data in the upper 300 
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652 meters and (c) unassimilated Argo S data below 3000 meters. RMS OMF differences quantify 

653 the improvement (negative values) or worsening (positive values) over the control and are shown 

654 in each panel for the SAFE (blue), FAST (red), EnOI (green) and TOI (magenta) runs. 

655 

656 Figure 8. Global average of RMS OMF over the control as a function of depth for (a) 

657 assimilated T data and (b) unassimilated S data in the second year (20 1 1) of the SAFE (blue), 

658 FAST (red), EnOI (green) and TOI (magenta) runs. Negative (positive) numbers indicate a 

659 reduction (increase) in RMS OMS statistics over the control run. 

660 

661 Figure 9. Horizontal distribution of RMS OMF differences for the unassimilated S data during 

662 2011 with the corresponding RMS OMF from the control run. The data are binned over 0-300- 

663 meter deep by 1° zonal by 1° meridional boxes. Negative values identify areas where the 

664 analysis is closer to the Argo observations than the corresponding state from the control run and 

665 vice versa. The four panels correspond to the SAFE (a), FAST (b), EnOI (C) and TOI (d) runs. 

666 

667 Figure 10. Same as Figure 9 for the Argo S observations below 300 meters. 

668 

669 Figure 11. Eastern equatorial pacific detail of SST field on August 27, 2007 in (a) the high- 

670 resolution 0.1° multi-scale global ocean analysis, (b) the 0.25° Reynolds SST data set assimilated 

671 in the second step of each daily multi-scale assimilation and (c) the 0.5° GMAO ocean analysis 

672 assimilated in the first-step of the multi-scale procedure. 

673 

674 Figure 12. Same as Figure 1 1 for the western north Pacific east of Japan. 
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